Intensity of Relationship Between Words: Using Word Triangles in Topic Discovery for Short Texts
Abstract
Uncovering latent topics in text is an important task that helps people make sense of large volumes of information, and it has made topic modeling an active research area. However, most texts available daily are short, so traditional topic models may perform poorly because of data sparsity. Popular models for short texts concentrate on word co-occurrence patterns in the corpus, but they do not consider the intensity of the relationship between words. We therefore propose a new model, the word-network triangle topic model (WTTM), which searches for word triangles to measure the relations between words. Experiments on a real-world corpus show that our method performs better under several evaluation measures.
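The abstract does not say how word triangles are defined or scored. As a hedged illustration only, the sketch below assumes that two words are linked when they co-occur in the same short text, treats three mutually linked words as a triangle, and uses the number of triangles a word pair belongs to as a hypothetical stand-in for relationship intensity. The function names, the toy corpus, and this scoring rule are assumptions for illustration, not WTTM itself.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(docs):
    """Build an undirected word network from tokenized short texts.

    Assumption for illustration: two distinct words are linked if they
    appear together in at least one document.
    """
    adj = defaultdict(set)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def find_word_triangles(adj):
    """Return every triangle (a, b, c) of mutually linked words once, with a < b < c."""
    triangles = []
    for a in sorted(adj):
        for b in sorted(n for n in adj[a] if n > a):
            # Any common neighbour c of a and b closes a triangle a-b-c.
            for c in sorted(n for n in adj[a] & adj[b] if n > b):
                triangles.append((a, b, c))
    return triangles


def edge_triangle_counts(triangles):
    """Hypothetical intensity score: how many triangles contain each word pair.

    Pairs are keyed alphabetically; a linked pair in no triangle is absent (score 0).
    """
    counts = defaultdict(int)
    for a, b, c in triangles:
        for pair in ((a, b), (a, c), (b, c)):
            counts[pair] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy corpus of pre-tokenized short texts (hypothetical data).
    docs = [
        ["apple", "iphone", "release"],
        ["iphone", "release", "price"],
        ["apple", "store"],
        ["rain", "weather"],
    ]
    triangles = find_word_triangles(build_word_network(docs))
    print(triangles)  # [('apple', 'iphone', 'release'), ('iphone', 'price', 'release')]
    counts = edge_triangle_counts(triangles)
    print(max(counts, key=counts.get))  # ('iphone', 'release')
```

In this toy corpus, `apple` and `store` co-occur once but close no triangle, whereas `iphone` and `release` sit in two triangles. This shows the kind of difference between raw co-occurrence and relationship strength that the abstract points to, although the actual model may define and weigh triangles differently.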
Similar resources
EFL Textbook Evaluation: An Analysis of Readability and Vocabulary Profiler of Four Corners Book Series
This study aimed to investigate whether there is any significant relationship between the readability and vocabulary profile including the most frequent words (K1 words) and academic word list (AWL) of reading passages of Four Corners series which were EFL textbooks. To determine the readability of the texts, the Flesch–Kincaid (1975) readability test was used, while the texts' academic word li...
Voice in Short Argumentative Texts Written by Undergraduate Learners of English
The present study explored the intensity level of authorial voice in relation to the quality of argumentative writing. 42 undergraduate learners of English as a foreign language (36 girls and 6 boys) spent 45 minutes to individually complete in-class position-taking writing tasks for three weeks. Their overall academic writing quality scores assigned based on portfolio assessment were studied i...
Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words
We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic l...
Topic Modeling over Short Texts by Incorporating Word Embeddings
Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Publication date: 2017